FOREWORD

This report embodies results of a continued program of research to
implement and test methods of mathematically mapping speech signals,
employing the theory and methods of orthogonalized exponentially damped
sinusoidal functions. The objective of this program was to obtain a
representation for speech signals that is optimal with respect to
preservation of information content, simplicity in implementation and
information efficiency.

ABSTRACT

Investigations of the analysis of speech in terms of a fixed exponential
function series have been carried out. The analysis-synthesis processing
was performed via digital computers operating with digitized speech. The
results indicate that the representation used is an efficient one for
speech waveform analysis and that the information content of the speech
is preserved when phase information is eliminated. The spectral
coefficients after phase elimination are not found to be an efficient
representation for the amplitude spectrum. Other results show that the
method of analysis is not limited to the speech of one individual.
Analytical studies indicate that it is possible to optimize the method of
analysis to essentially perfect its efficiency for speech waveform
analysis. Normalized, phase eliminated, spectral patterns derived for ten
vowel utterances by five talkers indicate the feasibility of performing
both automatic vowel and/or automatic speaker recognition using the
orthonormal coefficient data.

SECTION 1

INTRODUCTION

The Optimum Speech Signal Mapping Techniques program has continued the study
and evaluation of speech waveform analysis methods initiated under a previous
contract, No. AF30(602)-2446 with Rome Air Development Center. Under that program,
analysis-synthesis of continuous speech in terms of a fixed set of orthogonalized
exponentially damped sinusoidal functions was successfully demonstrated. The
program herein reported continues and extends the work of the previous contract
through the completion of the following items:


(1)  Tests were made of the correlations among the orthonormal coefficients
of the fixed exponential set for a six second sample of speech, using the
orthonormal coefficient data available from the previous contract. The
low cross correlations observed indicate that the function series used
is an efficient one for speech waveforms.

(2)  Tests were also made of the cross correlations among the coefficients
after phase information is eliminated. This refers to a set of
coefficients representing the spectrum magnitude. The correlations in this
case were substantially higher, indicating that the function series used
does not yield a particularly efficient representation of the amplitude
spectrum.

(3)  A preliminary study of the power spectrum of one set of coefficients was
made to determine the feasibility of time averaging the coefficients.
The results indicated that it is feasible to time average the coefficients
since most of their energy spectrum lies below 20 cycles/second.

(4)  A test was conducted of the ability of the orthogonalized exponentially
damped sinusoids to accurately represent the input speech when the phase
data have been eliminated from the representation. The results show that
the speech is still intelligible, indicating that the phase elimination
operation can be used to reduce the data by a factor of two for speech
recognition purposes. The discontinuities at the pitch period boundaries
make the quality of the resynthesized phase eliminated speech
objectionable.

(5)  Analytical studies, based in part upon the Karhunen-Loeve expansion,
were conducted in an effort to find optimum functional forms for speech
analysis. The results indicate that several methods exist for deriving
a new function set wherein each new function is a weighted sum of the
basic exponentially damped sinusoids. The new set would be derived
using experimental data on coefficient cross-correlations and would be
a more nearly optimum representation for speech. However, results to
date indicate that such new sets probably will be only slightly greater
in efficiency than the basic set.

(6)  Individual pitch periods taken from the central portions of ten vowels,
as uttered by five different male talkers, were analyzed and resynthesized
using orthogonalized exponentially damped sinusoids. The results indicate
that the analysis-synthesis method operates satisfactorily on the speech
of different talkers.

(7)  The spectral information for the ten vowels, obtained from the above
analysis, has been normalized and plotted to indicate the different
patterns exhibited for each vowel. While both more quantitative analysis
and more data would be needed for a final proof, it seems quite likely
that enough information is contained in these patterns to differentiate
the vowels in an automatic speech recognition program. Speaker to speaker
variations in these patterns indicate that the orthonormal coefficient
data can be made the basis of automatic speaker recognition, particularly
in view of the fact that the speakers all were of New England background.

The discussion which follows contains the details of the methods used in
accomplishing the tasks proposed on this program together with those results
which can be visually presented.


SECTION 2

DISCUSSION

2.0  PRELIMINARIES

It is worthwhile to present the following brief review of the basic
analysis-synthesis methods being used in this program in order to define the
terms used later.

The speech analysis technique being studied under this program operates by
sectioning the waveform into "analysis intervals." These intervals correspond
to pitch periods for the voiced sounds. If S(t) is the speech waveform, then
the waveform of the q-th analysis interval is

    S_q(t) = S(t),   t_q \le t < t_{q+1}
           = 0,      elsewhere                                        (1)

so that

    S(t) = \sum_q S_q(t) .                                            (2)

During the q-th analysis interval, the waveform is represented by an
approximation of the form:


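    S_q(t) \approx \sum_{k=1}^{n/2} A_k e^{-\alpha_k (t - t_q)} \cos[\omega_k (t - t_q) + \theta_k]    (3)

where

    n/2 = 16,   \alpha_k = \omega_k / 20,   \omega_k = 2\pi k \cdot 200,   k = 1, 2, \dots, 16 .    (4)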

Since the A_k and \theta_k are difficult to compute for a minimum mean square
error approximation, the analysis-synthesis process is actually performed in
terms of an orthogonalized version of the exponentially damped functions in
Eq. (3). Denote this set of orthonormal functions by \Phi_u(t), u = 1, 2, ..., n,
where n = 32 for the complete set. We then write

    S_q(t) \approx \sum_{u=1}^{n} C_u \Phi_u(t - t_q)                  (5)

where

    C_u = \int_{t_q}^{t_{q+1}} S_q(t) \Phi_u(t - t_q) dt              (6)

and the \Phi_u(t) have the orthonormal property over 0 < t < \infty:

    \int_0^\infty \Phi_u(t) \Phi_v(t) dt = 1,   u = v
                                         = 0,   u \ne v               (7)

The C_u are referred to as the orthonormal coefficients of the waveform
S_q(t). Details of the orthogonalization process are given in Reference 1. The
orthogonalized functions occur in pairs. Both members of such a pair,
\Phi_{2v-1}(t) and \Phi_{2v}(t), have the same amplitude spectrum, i.e., they
differ only in their phase spectra. The series in Eq. (5) can be rewritten to
place these pairs in evidence as follows:

    S_q(t) \approx \sum_{v=1}^{n/2} [ C_{2v-1} \Phi_{2v-1}(t - t_q) + C_{2v} \Phi_{2v}(t - t_q) ]    (8)

A "phase elimination" operation that is analogous to the same operation in
a Fourier sine-cosine series can be carried out giving:

    S'_q(t) = \sum_{v=1}^{n/2} B_v \Phi_{2v}(t - t_q)                 (9)

where \Phi_{2v}(t) is the even ordered member of the v-th pair of orthonormal
functions, n is the number of orthonormal functions used, and

    B_v = [ C_{2v-1}^2 + C_{2v}^2 ]^{1/2} .                           (10)

The B_v are referred to as the phase eliminated orthonormal coefficients of
the waveform S_q(t).

The functions used in Eq. (9) could as well have been the odd ordered
functions \Phi_{2v-1}(t).

Another phase eliminated resynthesis which yields smaller discontinuities at
the times t = t_q (pitch period boundaries) in the resynthesized speech is
given by:

    S''_q(t) = \sum_{v=1}^{n/2} (-1)^{v+1} B_v \Phi_{2v}(t - t_q)     (11)

The details of the methods by which digital computers are used to process the
speech to obtain S_q(t), S'_q(t) and S''_q(t) are discussed in Reference 1 and
in Appendix III.
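
The processing chain of this section can be illustrated numerically. The sketch below is not the procedure of Reference 1: it samples sixteen damped cosine and sine pairs on a finite window, orthonormalizes them with a QR factorization (a discrete Gram-Schmidt), and then forms the orthonormal coefficients and their phase eliminated magnitudes. The damping value, the 10 ms window, the function names and the synthetic test waveform are all assumptions made for illustration, and functions orthogonalized in this way are not guaranteed to share amplitude spectra within each pair as the functions of Reference 1 do.

```python
import numpy as np


def damped_sinusoid_basis(t, pairs=16, f0=200.0, damping_ratio=20.0):
    """Sample damped cosine/sine pairs at times t (seconds).

    The k-th pair uses omega = 2*pi*k*f0. The damping
    alpha = omega / damping_ratio is an assumed value for illustration.
    """
    columns = []
    for k in range(1, pairs + 1):
        omega = 2.0 * np.pi * k * f0
        envelope = np.exp(-(omega / damping_ratio) * t)
        columns.append(envelope * np.cos(omega * t))
        columns.append(envelope * np.sin(omega * t))
    return np.column_stack(columns)


def orthonormalize(basis, dt):
    """Orthonormalize the sampled functions (discrete analogue of Eq. (7))."""
    q, _ = np.linalg.qr(basis)      # columns satisfy q.T @ q = identity
    return q / np.sqrt(dt)          # rescale so sum(phi_u * phi_v) * dt = delta_uv


def analyze(segment, phi, dt):
    """Orthonormal coefficients of one analysis interval, as in Eq. (6)."""
    return phi.T @ segment * dt


def resynthesize(coeffs, phi):
    """Weighted sum of the orthonormal functions, as in Eq. (5)."""
    return phi @ coeffs


def phase_eliminate(coeffs):
    """Root sum of squares of each coefficient pair, as in Eq. (10)."""
    pairs = coeffs.reshape(-1, 2)
    return np.hypot(pairs[:, 0], pairs[:, 1])


fs = 12000.0                        # samples per second (assumed)
t = np.arange(120) / fs             # one hypothetical 10 ms analysis interval
raw = damped_sinusoid_basis(t)
phi = orthonormalize(raw, 1.0 / fs)

rng = np.random.default_rng(0)
segment = raw @ rng.standard_normal(32)   # synthetic waveform inside the span
c = analyze(segment, phi, 1.0 / fs)
b = phase_eliminate(c)
```

Because the synthetic waveform lies in the span of the basis, `resynthesize(c, phi)` reproduces it to rounding error. A real pitch period would leave a residual, which is the truncation error discussed in Section 2.3.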

2.1  ANALYSIS-SYNTHESIS

At the outset of the six month research program, which is the subject of this
report, a six second sample of connected speech comprising the two sentences
"Joe took father's shoe bench out. She is waiting at my lawn." had already
been analyzed via Eq. (6) and resynthesized via Eqs. (5) and (2) (see
Reference 1). This demonstrated the validity of the particular series
representation described in Reference 1. During the current program, the same
speech sample was also resynthesized using the same coefficient data derived
by Eq. (6), but with the "phase" information eliminated via Eq. (10).

This operation was performed to show that, by phase elimination, the number of
coefficients used to represent a given sound could be cut in half. This is
based upon the facts that the odd and even members of each pair of orthonormal
functions differ only in their phase spectra and that the ear is relatively
insensitive to changes in the phase spectrum of a quasi-periodic wave. Both
the resynthesis using Eq. (9) for n = 32, 24, and 16, and that using Eq. (11)
for n = 32 were performed. Since all of the orthonormal functions in the set
being used have positive values at t = 0, and since all of the phase
eliminated coefficients are positive, a positive initial value for each pitch
period resynthesized via Eq. (9) results, which produces a discontinuity at
each pitch period boundary. The alternating sign resynthesis of Eq. (11)
reduces these discontinuities. An example of the result for a single pitch
period is shown in Figure 1. Note that the resynthesis via Eq. (11) reduces
but does not eliminate the discontinuity at the start of the pitch period.

A listening test indicates that the phase eliminated speech is about as
intelligible as the non-phase eliminated speech, but that the quality of the
speech is objectionable because of the sharp discontinuities between pitch
periods. This effect is less pronounced but is still objectionable in the
resynthesis using alternating coefficient signs (see Eq. (11)).

In order to show that the analysis-synthesis operations described above
preserve the formant patterns in the speech sample, wide band sound
spectrograms were made of 2.4 second portions of the processed speech. The
spectrograms are made from the words "Joe took father's shoe ben..." Figure 2a
shows the spectrogram for the speech sample that was low pass filtered to
3000 cycles per second bandwidth, and subjected to analog to digital and
digital to analog conversion. Figure 2b shows a spectrogram for the
resynthesized speech using the 32 function orthonormal expansion. The
spectrogram in Figure 2c was made from the phase eliminated resynthesized
speech sample using Eq. (11). The formant pattern of the original speech is
seen to have been recreated in the resynthesized versions. The
discontinuities between pitch periods tend to "fill in" areas that are blank
in the original speech. This is especially true in Figure 2c where the phase
elimination process causes greater discontinuities to occur.

Figure 1. Phase Eliminated Speech Resynthesis

2.2  EFFICIENCY OF THE ORTHONORMAL SERIES REPRESENTATION

It is of interest to determine whether the orthonormal series that we are
using yields the most accurate possible description of the speech signal for a
given number of terms in the series. Given that the orthonormal coefficients
have been shown to yield a complete description of the speech, one may use the
correlation coefficients among the various coefficients as a measure of
whether or not each coefficient yields an independent measurement on the
speech signal. The most desirable situation is given by:

    \rho_{uv} = 1,   u = v
              = 0,   u \ne v                                          (12)

It is, of course, true that the linear independence implied by Eq. (12) does
not, in general, prove statistical independence, but under the circumstances
this is the most reliable indication that is readily available. Accordingly,
the data available from the previous analysis of six seconds of connected
speech (which included nearly all of the commonly occurring phonemes of
English) were used to compute

    \rho_{uv} = E[(C_u - \bar{C}_u)(C_v - \bar{C}_v)] / (\sigma_u \sigma_v)    (13)

where E[ ] indicates the expected value, the overbar indicates the mean value,

    \sigma_u = { E[(C_u - \bar{C}_u)^2] }^{1/2}

and

    u = 1, 2, 3, \dots, 32
    v = 1, 2, 3, \dots, 32 .

All averages are over the total number of pitch periods. The matrix whose
elements are the \rho_{uv} is given in Figure 3. The cross-correlations
indicate that the representation is an efficient one for speech, since only
about 26 percent of the correlation coefficients are greater than 0.25, and
only 1.6 percent of the correlation coefficients are 0.50 or greater.
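
The correlation test of Eq. (13) can be sketched as follows, assuming the orthonormal coefficients are held in an array with one row per pitch period and one column per coefficient. The function names, the tiny example data, and the choice to count magnitudes over the off-diagonal elements only are assumptions; the report does not state how its percentages were tallied.

```python
import numpy as np


def correlation_matrix(coeffs):
    """Matrix of rho_uv; coeffs[p, u] is coefficient u in pitch period p."""
    return np.corrcoef(coeffs, rowvar=False)


def fraction_at_or_above(rho, threshold):
    """Share of distinct coefficient pairs with |rho_uv| >= threshold."""
    upper = np.triu_indices_from(rho, k=1)   # each pair u < v counted once
    return float(np.mean(np.abs(rho[upper]) >= threshold))


# Hypothetical data: three pitch periods, three coefficients.
coeffs = np.array([[1.0, 2.0, 1.0],
                   [2.0, 4.0, 0.0],
                   [3.0, 6.0, 1.0]])
rho = correlation_matrix(coeffs)
share = fraction_at_or_above(rho, 0.5)
```

In this small example the first two coefficients are perfectly correlated and the third is uncorrelated with both, so one pair in three reaches the 0.5 threshold.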

Figure 3. Correlation Matrix of Orthonormal Coefficients

The same type of cross-correlation analysis was conducted using the phase
eliminated orthonormal coefficients (see Eq. (10)). In this case the object is
to determine whether the B_v represent essentially independent measurements on
the magnitude spectrum of the speech. The cross-correlation coefficients are
given by

    \rho_{\mu\nu} = E[(B_\mu - \bar{B}_\mu)(B_\nu - \bar{B}_\nu)] / (\sigma_\mu \sigma_\nu)    (14)

where

    \sigma_\mu = { E[(B_\mu - \bar{B}_\mu)^2] }^{1/2}

and

    \mu = 1, 2, \dots, 16,   \nu = 1, 2, \dots, 16 .

The matrix of correlation coefficients is shown in Figure 4. The
cross-correlation coefficients are generally much higher than those of
Figure 3, indicating that measurement of the B_v is not a particularly
efficient method of describing the amplitude spectrum.

2.3  APPROXIMATING OPTIMUM ORTHONORMAL FUNCTIONS FOR SPEECH ANALYSIS

One of the goals of the present program is the optimization of a set of
orthonormal functions for the purpose of speech analysis. The desired
properties are (a) minimum mean square truncation error, and (b) linearly
independent coefficients.

The Karhunen-Loeve theorem (see References 1 and 2 of Appendix II) indicates
that these properties are provided by the eigenfunctions of the homogeneous
Fredholm equation whose kernel is the covariance characteristic of the speech
process. It is unlikely that an exact solution of this problem will be
obtained. This is because it appears to be impossible to define an experiment
that would yield meaningful data for the specification of a covariance
function for speech. The reader is referred to Appendix II for a more
extensive discussion of these difficulties, as well as detailed mathematical
treatment of the work summarized below.

Figure 4. Correlation Matrix of Phase Eliminated Orthonormal Coefficients

As a result of the theoretical work carried out on this problem, it is
possible to define two methods of solution of the problem of obtaining an
optimal function set. Both methods start from a basic Fourier type analysis
using an arbitrary function set. (In the speech waveform analysis case, the
basic function set could be the orthogonalized exponentially damped sinusoids,
since they have already been shown to be efficient for speech and also have
the convenient property of being able to handle arbitrary length pitch
periods.) Using the experimentally measured cross-correlations between the
orthonormal coefficients (essentially an un-normalized version of the data
given in Figure 3), a new set of functions is computed. Each of the new
functions is an explicit linear summation of the old functions wherein the
weights are computed (using the data of Figure 3) such that the statistical
average of the cross-correlations between coefficients of the optimized
functions has been minimized (see Appendix II, Eq. (17)).

The optimized functions may be constrained, if desired, to be orthonormal
(Appendix II), in which case the resulting solutions are equivalent to
solutions of the Karhunen-Loeve equation (Appendix II).

On the other hand, the requirement for orthonormality can be relaxed. In this
case, explicit expressions for the optimum weights are not obtained, but
instead, the problem of minimization of a matrix involving the
cross-correlations between orthonormal coefficients (of the basic series) must
be solved. This latter method, although not explicit, may be capable of
producing a function set that will yield a truncation error as low as or lower
than the aforementioned explicit method.

Using the theory developed in Appendix II, and the available experimental data
described in Section 2.2 of this report, it will be possible to determine how
much of an improvement in efficiency over the basic orthogonalized
exponentially damped sinusoids can be obtained through the use of optimum
linear weighting.
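
The orthonormal-constrained method described above can be illustrated with a discrete Karhunen-Loeve calculation on measured coefficients. The sketch below is an assumption-laden stand-in for the derivation of Appendix II: it diagonalizes the sample covariance matrix of the coefficients (means removed), so that each column of the returned weight matrix defines one new function as a weighted sum of the basic orthonormal functions. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def decorrelating_weights(coeffs):
    """Eigen-decomposition of the coefficient covariance matrix.

    coeffs[p, u] is basic coefficient u in pitch period p. Column j of the
    returned weights gives new function j as sum_u weights[u, j] * Phi_u.
    """
    covariance = np.cov(coeffs, rowvar=False)
    variances, weights = np.linalg.eigh(covariance)   # symmetric matrix
    order = np.argsort(variances)[::-1]               # largest variance first
    return variances[order], weights[:, order]


# Synthetic, deliberately correlated "coefficients" for illustration only.
rng = np.random.default_rng(1)
mixing = rng.standard_normal((4, 4))
coeffs = rng.standard_normal((500, 4)) @ mixing

variances, weights = decorrelating_weights(coeffs)
new_coeffs = (coeffs - coeffs.mean(axis=0)) @ weights
new_covariance = np.cov(new_coeffs, rowvar=False)
```

The coefficients of the new functions are uncorrelated over the sample, and because the weight matrix is orthogonal the new functions remain orthonormal. Keeping only the first few columns retains the largest share of the coefficient variance for a given number of terms.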

2.4  ENERGY SPECTRUM OF THE ORTHONORMAL COEFFICIENTS

To the extent that the orthonormal coefficients are a spectral description of
the speech signal, it may be expected that the rate of variation of the
quantities |B_v| will be limited by the rate of variation of the spectrum
controlling mechanism, the vocal tract. The results of a hand-computed, octave
band, spectral analysis of a 0.45 second record of one such coefficient are
shown in Figure 5. Figure 5 shows that, for the particular phase eliminated
coefficient considered, there is relatively little energy at rates above 18
cycles per second. This value is typical of the parameter bandwidths that have
been found necessary in speech analysis-synthesis systems such as the channel
Vocoder.
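
An octave band analysis of a coefficient track can be sketched with a discrete Fourier transform. This is not the hand computation used for Figure 5: the sketch assumes the coefficient has been resampled to a uniform rate (in practice one value is produced per pitch period, which is not uniformly spaced), and the rate, band edges, function name and synthetic track are all hypothetical.

```python
import numpy as np


def octave_band_energy(track, rate, lowest=1.0):
    """Energy of a coefficient track in successive octave bands.

    track: samples of one coefficient, assumed uniformly spaced in time
    rate:  samples per second of the track
    Returns a list of (low, high, energy) tuples, frequencies in cycles/second.
    """
    track = np.asarray(track, dtype=float)
    spectrum = np.abs(np.fft.rfft(track - track.mean())) ** 2
    freqs = np.fft.rfftfreq(track.size, d=1.0 / rate)
    bands = []
    low = lowest
    while low < rate / 2.0:
        high = 2.0 * low
        in_band = (freqs >= low) & (freqs < high)
        bands.append((low, high, float(spectrum[in_band].sum())))
        low = high
    return bands


rate = 100.0                                   # hypothetical track rate
time = np.arange(200) / rate                   # a 2 second synthetic record
track = 1.0 + 0.5 * np.sin(2.0 * np.pi * 5.0 * time)   # 5 cycles/second variation
bands = octave_band_energy(track, rate)
strongest = max(bands, key=lambda band: band[2])
```

For this synthetic track essentially all of the energy falls in the 4 to 8 cycles/second band. A record as short as 0.45 second gives a frequency resolution of only about 2 cycles/second, so the lowest bands of such an analysis are coarse.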

2.5  TESTS OF THE ANALYSIS METHOD ON SPEECH SAMPLES TAKEN FROM SEVERAL TALKERS

One of the limitations of the previous tests of speech analysis-synthesis
using orthogonalized exponentially damped sinusoids has been removed, during
the current program, by making tests on the speech of five male talkers. Two
samples of each of the ten words (he, hid, head, had, hub, her, ah, awe, hood,
who) were recorded from each of five speakers and digitized at a sampling rate
of 12,000 samples/second with 10 bits per sample accuracy. Plots were made of
the waveforms of these words so that individual pitch periods could be
selected for analysis. A pitch period from the central portion of the vowel in
each of the 100 digitized words was selected for analysis. These 100 pitch
periods were then analyzed using Eq. (6) with the full 32 term orthonormal
function series. The orthonormal coefficients were saved for later analysis.
These coefficients and the function set were used to resynthesize least mean
square error approximations to the original waveforms.

Fifty of these pitch period curves (one for each of ten vowels for each of
five speakers) are shown in Appendix I. These provide ample opportunity for
comparison of original and resynthesized curves, and show that the particular
choice of functions and parameters used in the analysis work of this program
is not limited in applicability to any particular speaker.

2.6  PATTERNS OF PHASE ELIMINATED ORTHONORMAL COEFFICIENTS FOR TEN VOWELS

The orthonormal coefficient data for the five speakers (as described in
Section 2.5) were used to compute the phase eliminated orthonormal
coefficients as given by Eq. (10) of Section 2.0. The coefficients for each of
the two repetitions by each speaker of each word were normalized and then
averaged. These normalized and averaged coefficients are plotted in Figures 6
through 15 for the ten words as described above in Section 2.5. Each of these
figures contains the coefficient pattern for one pitch period of the central
vowel of the given word for each of the five talkers. Examination of these
patterns shows that different patterns are indeed obtained for the different
vowels. Thus, the data indicate that such coefficient patterns would be useful
input data for a speech recognition machine.

It should be noted that the results shown are for single pitch periods, and
that in a practical situation more reliable results would probably be obtained
by using data obtained from several adjacent pitch periods in each sound to be
recognized.
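
The normalize-then-average step used to form the plotted patterns can be sketched as below. The report does not state which normalization was applied; scaling each set of phase eliminated coefficients to unit Euclidean length is an assumption made here for illustration, as are the function names and the three-element example vectors (the actual patterns have sixteen elements).

```python
import numpy as np


def normalize_pattern(b):
    """Scale one set of phase eliminated coefficients to unit length (assumed)."""
    b = np.asarray(b, dtype=float)
    return b / np.linalg.norm(b)


def averaged_pattern(repetitions):
    """Average the normalized patterns of several repetitions of one word."""
    return np.mean([normalize_pattern(b) for b in repetitions], axis=0)


# Two hypothetical repetitions that differ only in overall level.
first = np.array([3.0, 4.0, 0.0])
second = np.array([6.0, 8.0, 0.0])
pattern = averaged_pattern([first, second])
```

Normalizing before averaging removes differences in overall level between repetitions, so the averaged pattern reflects spectral shape rather than loudness. The same function extends directly to averaging over several adjacent pitch periods.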

Figure 11. Orthonormal Spectral Pattern for her (Vowel)

Figure 14. Orthonormal Spectral Pattern for hood (Vowel)

The peaks in the patterns of Figures 6 through 15 are generally correlated
with formant locations, which are the basic parameters associated with the
articulatory configuration for each vowel.

Speaker to speaker variations are, of course, also evidenced in the patterns
shown. These variations always reduce the reliability of automatic speech
recognition logics. On the other hand, the speaker related variations
evidenced in Figures 6 through 15 may also be exploited for speaker
recognition applications.

Of course, these data must be taken only as preliminary indications, since an
analysis of variance of larger quantities of data (than could be collected
within the scope of the current limited program) would be needed to produce
definitive results as to the reliability of a recognition process using this
type of data.

SECTION 3

CONCLUSIONS

The following conclusions can be drawn from the results of the current six
month program of speech analysis research.

(1)  The orthogonalized exponentially damped sinusoidal set using sixteen
fixed frequencies and fixed damping at each frequency is a valid
representation for pitch synchronous analysis of speech signals, independent
of the individual speaker.

(2)  The above representation is efficient for speech waveform analysis in the
sense that the experimentally derived cross-correlations between orthonormal
coefficients are small.

(3)  The exponentially damped sinusoidal representation is less efficient
for speech energy spectra than for the waveform itself. This is evidenced
by the fact that the phase eliminated coefficients are more highly
cross-correlated than the basic orthonormal coefficients themselves. The
spectrum representation can be made more efficient either by linear coding
techniques or by modifying the analysis process so that the Fourier
coefficients of the energy spectrum are derived directly by means of
optimized autocorrelation analysis.

(4)  The energy spectrum of the phase eliminated orthonormal coefficients
derived by the method of Fourier analysis in terms of exponentially damped
sinusoids has been found to be essentially limited to frequencies below 20
cycles/second. Thus, in the case of many speech sounds, it is feasible to time
average the coefficient spectral data without destroying any of the
fundamental phonemic information.

(5)  The elimination of spectral phase information, in the generalized Fourier
representation used in these studies, does not significantly reduce the
intelligibility in an analysis-synthesis test. This process does, however,
seriously reduce the quality of the resynthesized speech heard by a human
listener.

(6)  Theoretical studies have shown that it is possible to derive a new set of
orthonormal functions, based on the orthogonalized exponentially damped
sinusoids, wherein the coefficients will be linearly independent. Such
a derived set of functions would consist of weighted sums of the original
base functions and would be optimally efficient in the sense that the
representation would provide a minimum expected mean square error for a
given number of terms in the series.

(7)  The "phase eliminated" spectrum patterns for ten vowels as uttered by five
male speakers have been measured and these results indicate that it is
feasible to perform either vowel recognition or speaker recognition using
the orthonormal coefficients as input data. In the case of speaker
recognition it is to be noted that all five speakers were of New England
backgrounds.

APPENDIX I

WAVEFORMS OF CENTRAL PITCH PERIODS OF TEN VOWEL UTTERANCES BY FIVE TALKERS
RESYNTHESIZED IN TERMS OF FIXED ORTHOGONALIZED EXPONENTIALLY DAMPED SINUSOIDS

The fifty curves in Appendix I are designated in terms of file numbers, 1
through 5, and pitch period numbers, 1 through 10. The file number refers to
the individual speaker and the pitch period number refers to a central pitch
period of the vowel portion of one of ten words. The five speakers are all
mature males of New England state background. The words are designated as
follows:

    01   he
    02   hid
    03   head
    04   had
    05   hub
    06   her
    07   ah
    08   awe
    09   hood
    10   who

Example: The designation "File number 05, pitch period number 01" refers to
speaker number 5 and the vowel of the word he.
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FILE  NO.  I  PITCH  PERIOD  NO.  01  —  TIME  OF  FIRST  PITCH  POINT  0)0  SECONDS,  234  MILLISECONDS 


1-2 


FILE  NO.  1  PITCH  PERIOD  NO.  02  -TIME  OF  FIRST  PITCH  POINT  012  SECONDS,  779  MILLISECONDS 


/ 


s 

k 


FILE  NO.  1  PITCH  PERIOD  NO.  08  —  TIME  OF  FIRST  PITCH  POINT  030  SECONDS,  6X  MILLISECONDS 


I- 


FILE  NO.  I  PITCH  PERIOD  NO.  09  —  TIME  OF  FIRST  PITCH  POINT  033  SECONDS,  436  MILLISECONDS 


1-10 


FILE  NO.  I  PITCH  PERIOD  NO.  10  —  TIME  OF  FIRST  PITCH  POINT  036  SECONDS,  221  MILLISECONDS 


1-11 


« 


file  no.  2  PITCH  PERIOD  NO.  04  —  TIME  OF  FIRST  PITCH  POINT  033  SECONDS,  717  MILLISECONDS 


1-15 


FILE  NO.  2  PITCH  PERIOD  NO,  05  —  TIME  OF  FIRST  PITCH  POINT  035  SECONDS,  581  MILLISECONDS 


i-l6 


# 


FILE  NO.  2  PITCH  PERIOD  NO.  06  —  TIME  OF  FIRST  PITCH  POINT  037  SECONDS,  665  MILLISECONDS 


1-17 


•t 


FILE  NO.  2  PITCH  PERIOD  NO.  10  —  TIME  OF  FIRST  PITCH  POINT  045  SECONDS,  441  MILLISECONDS 


1-21 


FILE  NO.  3  PITCH  PERIOD  NO.  01  —  TIME  OF  FIRST  PITCH  POINT  020  SECONDS,  864  MILLISECONDS 


1-22 


/  < 


FILE  NO.  3  PITCH  PERIOD  NO.  03  -  TIME  OF  FIRST  PITCH  POINT  025  SECONDS,  251  MILUSECONDS 


1-24 


i 


FILE  NO,  3  PITCH  PERIOD  NO.  M  —  TIME  OF  FIRST  PITCH  POINT  027  SECONDS,  753  MILLISECONDS 


1-25 


FILE  NO.  3  PITCH  PERIOD  NO-  05  —  TIME  OF  FIRST  PITCH  POINT  030  SECONDS,  096  MILLISECONDS 


1-^6 


FILE  NO.  3  PITCH  PERIOD  NO.  07  —  TIME  OF  FIRST  PITCH  POINT  036  SECONDS,  647  MILLISECONDS 


1-28 


FILE  NO.  3  PITCH  PERIOD  NO.  08  -  TIME  OF  FIRST  PITCH  POINT  041  SECONDS,  937  MILLISECONDS 


1-29 


f  • 

^  « 

•  4 

>  >■ 


A 


\  • 


t 


y 


FILE  NO.  3  PITCH  PERIOD  NO.  09  —  TIME  OF  FIRST  PITCH  POINT  044  SECONDS,  597  MILLISECONDS 


1-30 


FILE  NO.  3  PITCH  PERIOD  NO.  10  —  TIME  OF  FIRST  PITCH  POINT  046  SECONDS,  940  MILLISECONDS 


1-31 


n 


>• 
•  • 


FILE  NO,  4  PITCH  PERIOD  NO.  01  —  TIME  OF  FIRST  PITCH  POINT  050  SECONDS,  344  MILLISECONDS 


1-52 


4. 


•  • 


f 

4 


• 

•  ✓ 


FILE  NO.  4  PITCH  PERIOD  NO.  02  —  TIME  OF  FIRST  PITCH  POINT  053  SECONDS,  21 1  MILLISECONDS 


1-53 


♦ 

*  • 

•  » 

FILE  NO.  4  PITCH  PERIOD  NO.  03  —  TIME  OF  FIRST  PITCH  POINT  056  SECONDS,  077  MILLISECONDS 


•54 


«  « 

•  • 
•  • 
FILE NO. 4 PITCH PERIOD NO. 04 - TIME OF FIRST PITCH POINT 059 SECONDS, 686 MILLISECONDS

FILE NO. 4 PITCH PERIOD NO. 05 - TIME OF FIRST PITCH POINT 062 SECONDS, 933 MILLISECONDS

FILE NO. 4 PITCH PERIOD NO. 06 - TIME OF FIRST PITCH POINT 066 SECONDS, 502 MILLISECONDS

FILE NO. 4 PITCH PERIOD NO. 07 - TIME OF FIRST PITCH POINT 070 SECONDS, 592 MILLISECONDS

FILE NO. 4 PITCH PERIOD NO. 08 - TIME OF FIRST PITCH POINT 074 SECONDS, 181 MILLISECONDS

FILE NO. 4 PITCH PERIOD NO. 09 - TIME OF FIRST PITCH POINT 077 SECONDS, 950 MILLISECONDS

FILE NO. 4 PITCH PERIOD NO. 10 - TIME OF FIRST PITCH POINT 081 SECONDS, 277 MILLISECONDS

FILE NO. 5 PITCH PERIOD NO. 01 - TIME OF FIRST PITCH POINT 017 SECONDS, 899 MILLISECONDS

FILE NO. 5 PITCH PERIOD NO. 02 - TIME OF FIRST PITCH POINT 019 SECONDS, 300 MILLISECONDS


FILE  NO.  5  PITCH  PERIOD  NO.  03  —  TIME  OF  FIRST  PITCH  POINT  021  SECONDS,  101  MILLISECONDS 

FILE NO. 5 PITCH PERIOD NO. 04 - TIME OF FIRST PITCH POINT 023 SECONDS, 143 MILLISECONDS

FILE NO. 5 PITCH PERIOD NO. 05 - TIME OF FIRST PITCH POINT 025 SECONDS, 064 MILLISECONDS

FILE NO. 5 PITCH PERIOD NO. 06 - TIME OF FIRST PITCH POINT 027 SECONDS, 926 MILLISECONDS

FILE NO. 5 PITCH PERIOD NO. 07 - TIME OF FIRST PITCH POINT 030 SECONDS, 828 MILLISECONDS




FILE  NO.  5  PITCH  PERIOD  NO.  08  —  TIME  OF  FIRST  PITCH  POINT  032  SECONDS,  789  MILLISECONDS 


FILE  NO.  5  PITCH  PERIOD  NO.  09  -  TIME  OF  FIRST  PITCH  POINT  034  SECONDS,  750  MILLISECONDS 


FILE  NO.  5  PITCH  PERIOD  NO.  10  —  TIME  OF  FIRST  PITCH  POINT  036  SECONDS,  872  MILLISECONDS 
APPENDIX II


A METHOD FOR APPROXIMATING OPTIMUM ORTHOGONAL FUNCTIONS
FOR SPEECH ANALYSIS


II.0 INTRODUCTION

One of the goals of our Optimum Speech Program is the determination of an
"optimum set of orthonormal functions," where optimum refers to the properties of
the associated generalized Fourier-series expansions of speech signals. The desired
properties are (a) minimum truncation error (in the mean-square sense) and (b)
linearly independent coefficients (in the statistical sense). The Karhunen-Loeve
theorem(1) and the work of K. L. Jordan(2) indicate that these properties are
provided by the eigenfunctions of the homogeneous Fredholm equation (arranged in order
of decreasing eigenvalues) whose kernel is the covariance function characteristic
of the speech process (refer to Eq. (II-7)). However, an exact solution of this
problem requires the specification of a meaningful covariance function.

To date, insufficient data are available from which to compute, or even to
determine the existence of, a covariance function. Indeed, one is hard-pressed to
define in advance what would constitute meaningful data, in view of speaker, regional,
sexual, age, contextual, syntactic, emotional, etc., variations. Hence, this
problem can be solved only in a limited but hopefully non-trivial sense.

The approach taken here utilizes experimentally-determined covariant properties
given in terms of the coefficients of an arbitrary Fourier expansion of a speech
sample as the data to be used to determine the "optimum" function set; optimum,
at least in the sense discussed in the last paragraph, for that particular speech
sample used in the computation of the Fourier coefficients. Since explicit
expressions for the new functions in terms of the old functions are given (in fact,
as linear combinations), the possibilities for and errors involved in truncating
the new series are immediately available.

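II.1 IMPLICIT SOLUTION

The initial Fourier series expansion is given by

\[ x(t) = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} C_n\,\phi_n(t), \qquad a < t < b \tag{II-1} \]

where

\[ E(C_n C_m) \neq K\,\delta_{nm} \tag{II-2} \]

\[ \int_a^b \phi_n(t)\,\phi_m(t)\,dt = \delta_{nm} \tag{II-3} \]

The desired series expansion is

\[ x(t) = \underset{M \to \infty}{\mathrm{l.i.m.}} \sum_{m=1}^{M} a_m\,\psi_m(t), \qquad a < t < b \tag{II-4} \]

where

\[ E(a_m a_n) = \lambda_m^2\,\delta_{mn} \tag{II-5} \]

\[ \int_a^b \psi_m(t)\,\psi_n(t)\,dt = \delta_{mn} \tag{II-6} \]

The Karhunen-Loeve integral equation satisfied by the $\psi_m(t)$'s is(1)

\[ \lambda_m^2\,\psi_m(t) = \int_a^b K(t,s)\,\psi_m(s)\,ds \tag{II-7} \]

where, according to Mercer's Theorem,(1)

\[ K(t,s) = E(x(t)\,x(s)) = \underset{M \to \infty}{\mathrm{l.i.m.}} \sum_{m=1}^{M} \lambda_m^2\,\psi_m(t)\,\psi_m(s) \tag{II-8} \]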


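In terms of Eq. (II-1), Eq. (II-8) may also be written as

\[ K(t,s) = \underset{N \to \infty}{\mathrm{l.i.m.}}\; E\Bigl[\sum_{n=1}^{N} \sum_{j=1}^{N} C_n C_j\,\phi_n(t)\,\phi_j(s)\Bigr] = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} \sum_{j=1}^{N} \phi_n(t)\,\phi_j(s)\,E(C_n C_j) \tag{II-9} \]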


The last manipulation in Eq. (II-9) must hold if we assume that x(t) is a second-order
random process, because then all second moments,
$E(C_n C_j\,\phi_n(t)\,\phi_j(s)) = \phi_n(t)\,\phi_j(s)\,E(C_n C_j)$,
exist. Consequently, each term on the right-hand side of Eq. (II-9) is equivalent to a
summable function in the relevant joint probability measure space and, as a result,(3)
the summation and integration operations can be interchanged.

Equations (II-8) and (II-9) may be equated (M.S.*) and written as follows:

\[ \Psi(t)'\,\Lambda'\Lambda\,\Psi(s) \overset{\mathrm{M.S.}}{=} \Phi(t)'\,C\,\Phi(s) \tag{II-10} \]


It is shown in Addendum A that this is equivalent to Eq. (II-7). Here

$\Psi(t)$ and $\Phi(t)$ are the column matrices whose elements are the $\psi_m(t)$ and
$\phi_n(t)$, respectively,

$\Lambda = \Lambda'$ is the square diagonal M x M matrix whose diagonal elements are
$\Lambda_{ii} = \lambda_i$,

C is the square N x N symmetric matrix of elements $E(C_i C_j) = C_{ij}$,

Superscript ' indicates transpose.

If the $\lambda$'s are equal to the $\mu$'s, the square roots of the M eigenvalues of C,
then Eq. (II-10) is immediately recognized as the well-known formula for diagonalizing
a quadratic relation, where the $\psi$'s are given by

\[ \Psi(t) = Q'\,\Phi(t) \tag{II-11} \]

* M.S.: equality in the mean-square sense.


Q is, of course, the orthogonal matrix whose columns are the eigenvectors of C.

It can be shown that Eq. (II-11) is also a solution of Eq. (II-7) as well as Eq. (II-10).

Thus, by finding the M eigenvalues and eigenvectors of C, a satisfactory set
of $\psi$'s may be determined. However, since we desire an explicit solution to Eq. (II-10)
for the expansion functions of Eq. (II-4) in terms of those in Eq. (II-1) in order to be
able to evaluate truncation errors, another approach is worth investigating.
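
As a small numerical illustration of the eigenvalue route of Eq. (II-11), the following sketch diagonalizes a 2 x 2 symmetric coefficient covariance matrix in closed form. The matrix entries and the names `diagonalize_2x2` and `new_function_weights` are illustrative assumptions, not values or routines from this program; for N > 2 a general symmetric eigenvalue routine would be needed.

```python
import math

def diagonalize_2x2(c11, c12, c22):
    """Return (Q, mu_sq) for the symmetric matrix [[c11, c12], [c12, c22]].

    Columns of Q are unit eigenvectors of the matrix; mu_sq holds the matching
    eigenvalues (the squares of the mu's in the text), largest first.
    """
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    Q = [[c, -s], [s, c]]
    mu_sq = [
        c11 * c * c + 2.0 * c12 * s * c + c22 * s * s,
        c11 * s * s - 2.0 * c12 * s * c + c22 * c * c,
    ]
    return Q, mu_sq

def new_function_weights(Q):
    """Rows of Q' express each psi_m as a combination of the phi_n (Eq. II-11)."""
    return [[Q[n][m] for n in range(2)] for m in range(2)]

# Hypothetical covariance of two Fourier coefficients (assumed values).
Q, mu_sq = diagonalize_2x2(4.0, 2.0, 3.0)
weights = new_function_weights(Q)
```

The rows of `weights` are the coefficients of $\phi_1$ and $\phi_2$ in each new function; the sum of the two eigenvalues equals the trace of C.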

II.2 EXPLICIT SOLUTION

The expansion functions in (II-10) are the $\lambda_m \psi_m(t)$; therefore, if these could be
determined as explicit functions of the $\phi_n(t)$ and $C_{ij}$, then the desired form
of the solution is obtained.* This can be done if C in Eq. (II-10) can be represented
as the product of a matrix, A, times its transpose A', since then we would have

\[ \Lambda\,\Psi(s) = A'\,\Phi(s) \tag{II-12} \]

Bodewig(4) and Anderson(5) show us that we can partition the symmetrical matrix
C into

\[ C = (I + L)\,D\,(I + L') = (I + U)'\,D\,(I + U) \tag{II-13} \]

where

L is a lower matrix (only elements below the diagonal are non-zero),

U is an upper matrix (only elements above the diagonal are non-zero),

I is the unit matrix,

D is a diagonal matrix with $\det D = \det C = \prod_{i=1}^{N} a_{ii}^2$,

(I + U)' = (I + L) for a symmetric matrix.

Since D can be partitioned readily into $\sqrt{D}\,\sqrt{D}$, where $\sqrt{D}$ is a diagonal
matrix which is defined such that its elements are the square roots of the
corresponding elements of D, then we can set

\[ A' = \sqrt{D}\,(I + L') = \sqrt{D}\,(I + U) \tag{II-14} \]

or

\[ A = (I + L)\,\sqrt{D} = (I + U)'\,\sqrt{D} \]

with

\[ \det A = \det \sqrt{D} = \prod_{i=1}^{N} a_{ii} \]

(see Equation (II-12)).

* The $\lambda_m$'s themselves are determined later as normalizing constants on the $\psi_m$'s.
In this Section the $\lambda_m$'s are not the eigenvalues of C as in Eq. (II-11). The
relationship between the two is discussed in Section II.2.

Thus, Eq. (II-12) may be written explicitly as

\[ \lambda_m\,\psi_m(t) = \sum_{n=m}^{N} a_{mn}\,\phi_n(t); \qquad m = 1, \ldots, N \tag{II-15} \]

where the $a_{mn}$'s are the elements of A'.

The upper value, N, on m is imposed by the truncation of the initial series
expansion, Eq. (II-1). Hopefully, the series in Eq. (II-4) may be further
truncated so that M < N. The degree of additional truncation may be determined
from the numerical evaluation of Eq. (II-15).

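$\lambda_m$ is determined from Eq. (II-15) and the normalizing relationships Eq. (II-6)
and Eq. (II-3):*

\[ \lambda_m^2 = \sum_{n=m}^{N} a_{mn}^2 = N^2(A_{m\cdot}); \qquad m = 1, \ldots, N \tag{II-16} \]

* Note that

\[ \sum_{m=1}^{N} \lambda_m^2 = \sum_{m=1}^{N} \sum_{n=m}^{N} a_{mn}^2 = N^2(A') = \mathrm{Sp}(C) = \sum_{n=1}^{N} C_{nn} \]

where N is the norm and Sp is the spur or trace of the matrix.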


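Then

\[ \psi_m(t) = \frac{\sum_{n=m}^{N} a_{mn}\,\phi_n(t)}{\Bigl[\sum_{n=m}^{N} a_{mn}^2\Bigr]^{1/2}} = \frac{\sum_{n=m}^{N} a_{mn}\,\phi_n(t)}{N(A_{m\cdot})}; \qquad m = 1, \ldots, M \le N \tag{II-17} \]

where $A_{m\cdot}$ is a row matrix of elements $a_{mn}$ (with m fixed), and N is the norm of a
matrix.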

Note that Eq. (II-16) results from the orthonormality conditions on the $\psi_m$'s
and $\phi_n$'s. To the author's knowledge, relation Eq. (II-16) does not appear explicitly
in the literature, although it utilizes well-known results, since most of the
literature refers to matrices without conditions Eq. (II-3) and Eq. (II-6). As a
result, Eq. (II-17) appears to be a new, and for our purposes a very useful, result.

The author would like to be apprised of the availability of this result elsewhere.
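
The explicit solution of Eqs. (II-12) to (II-17) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the 3 x 3 matrix C is made-up data, the function names are hypothetical, and the upper-triangular factor A' is obtained here by a standard Cholesky recursion, which yields the same A' as the partition of Eq. (II-13) when C is positive definite.

```python
import math

def upper_factor(C):
    """Return A' (upper triangular) with C = A A', for symmetric positive-definite C."""
    size = len(C)
    At = [[0.0] * size for _ in range(size)]
    for m in range(size):
        pivot = C[m][m] - sum(At[k][m] ** 2 for k in range(m))
        if pivot <= 0.0:
            raise ValueError("C is not positive definite")
        At[m][m] = math.sqrt(pivot)
        for j in range(m + 1, size):
            cross = sum(At[k][m] * At[k][j] for k in range(m))
            At[m][j] = (C[m][j] - cross) / At[m][m]
    return At

def lambdas_squared(At):
    """Eq. (II-16): lambda_m^2 is the squared norm of row m of A'."""
    return [sum(a * a for a in row) for row in At]

def psi_weights(At):
    """Eq. (II-17): row m of A' divided by its norm gives psi_m in terms of the phi_n."""
    return [[a / math.sqrt(lam) for a in row]
            for row, lam in zip(At, lambdas_squared(At))]

# Hypothetical covariance matrix of three Fourier coefficients (assumed values).
C = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
At = upper_factor(C)
lam_sq = lambdas_squared(At)
weights = psi_weights(At)
```

For this matrix the $\lambda_m^2$ sum to the trace of C, as stated in the footnote to Eq. (II-16), and the last row of `weights` reduces to a single unit entry on $\phi_N$.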

In Addendum B we have derived the following explicit expressions for the $a_{mn}$'s.

Form I: The $a_{mn}$'s are the ratios of the following determinants of m x m
matrices of appropriate $C_{ij}$'s:

\[ a_{mn} = \frac{a'_{mn}}{\sqrt{a'_{mm}\,a'_{m-1\,m-1}}} = \frac{|C_{11}\,C_{22} \cdots C_{m-1\,m-1}\,C_{mn}|}{\sqrt{|C_{11}\,C_{22} \cdots C_{m-1\,m-1}\,C_{mm}|\;|C_{11}\,C_{22} \cdots C_{m-1\,m-1}|}} \tag{II-18} \]

where the notation introduced in Eq. (II-18) indicates the determinant of an
m x m matrix whose diagonal elements are those shown, i.e.,

\[ |C_{11}\,C_{22} \cdots C_{m-1\,m-1}\,C_{mn}| = \begin{vmatrix} C_{11} & C_{12} & \cdots & C_{1\,m-1} & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2\,m-1} & C_{2n} \\ \vdots & \vdots & & \vdots & \vdots \\ C_{m-1\,1} & C_{m-1\,2} & \cdots & C_{m-1\,m-1} & C_{m-1\,n} \\ C_{m1} & C_{m2} & \cdots & C_{m\,m-1} & C_{mn} \end{vmatrix} \tag{II-19} \]

Only values of $a_{mn}$ for which n ≥ m are required.

Form II: An alternative and perhaps computationally more useful set of
iterative expressions for the $a_{mn}$'s is given by the Gaussian or Gauss-Doolittle
algorithm.(4) (In reference (4) many computational extensions and variations to the
procedure are also given.)

In this case

\[ a_{mn} = \frac{c_{mn}^{(N-1)}}{\sqrt{c_{mm}^{(N-1)}}} \tag{II-20} \]

where $c_{mn}^{(N-1)}$ is an iterated matrix element.

Note that for Eq. (II-17) we don't need the denominators of Eq. (II-18) and
(II-20). Thus, if the $\lambda_m$'s were not required explicitly, we could simplify our
computations by just computing the numerators.
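
The iterative (Form II) route can be sketched as plain forward elimination on C, followed by scaling each reduced row by the square root of its pivot. This is an illustrative reading of the Gauss-Doolittle procedure rather than the program actually used; the matrix values and the function names `gauss_reduce` and `a_elements` are assumptions, and no pivoting is done, which presumes all pivots stay positive.

```python
import math

def gauss_reduce(C):
    """Forward elimination without pivoting; returns the final upper-triangular iterate."""
    size = len(C)
    R = [row[:] for row in C]
    for i in range(size - 1):
        for p in range(i + 1, size):
            factor = R[p][i] / R[i][i]
            for q in range(i, size):
                R[p][q] -= factor * R[i][q]
    return R

def a_elements(C):
    """Scale row m of the reduced matrix by 1/sqrt(pivot) to get row m of A'."""
    R = gauss_reduce(C)
    size = len(R)
    return [[R[m][k] / math.sqrt(R[m][m]) if k >= m else 0.0 for k in range(size)]
            for m in range(size)]

# Hypothetical covariance matrix (assumed values).
C = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
A_t = a_elements(C)
```

Dropping the division by the square root of the pivot leaves the unnormalized rows, which is all that Eq. (II-17) needs once each row is rescaled to unit norm.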

In  our  problem,  the  C  matrix  is  symmetric,  all  >  0;  and  Ij^J. 

Note that, from Eq. (II-17), it is seen that

\[ \psi_N(t) = \phi_N(t) \tag{II-21} \]

An apparent resemblance to a Gram-Schmidt orthogonalization procedure is to be noted.

We have determined a method for approximating the $\psi_m$'s and $\lambda_m$'s explicitly
in terms of the $\phi_n$'s and $C_{ij}$'s; from this a straightforward computation
and error analysis can be made.
II.3 SUGGESTIONS, INTERPRETATIONS, AND CONCLUSIONS

It is suggested by this paper that the computations be undertaken, for only
by inserting numerical values can the goodness and degree of improvement offered by
the new series be determined. The values of the $\lambda_m$'s (Eq. (II-16)) should decrease
with m either in the ordering of the $\lambda_m$'s suggested by Eq. (II-16) or by reordering.
Although the writer did not have time to prove this for the present work, it is
believed that the ordering of the $\lambda$'s can be established before evaluating. The
behavior of the $a_{mn}$'s will indicate the validity of the N-term truncation Eq. (II-16)
as an approximation to $\lambda_m$. The same may be said for Eq. (II-17) and $\psi_m(t)$. Errors
in the new series expansion, Eq. (II-4), are due to two factors: errors in the
N-term expansions of $\lambda_m$ and $\psi_m$, and those caused by truncating M (so that M < N). The
latter errors (with respect to Eq. (II-1) as a standard) may be computed once we
are satisfied with the accuracy of the N-term approximations to the $\lambda_m$'s and $\psi_m$'s.
Since the new functions are linear combinations of the old ones, as given in
Eq. (II-17), we are guaranteed of at least as good an approximation using Eq. (II-4)
as with Eq. (II-1) for M = N. Thus the N-term approximation to the $\lambda_m$'s and
$\psi_m$'s can result in no worse than as good a representation as the $\phi_n$'s.

To reiterate a contention made earlier, it is believed that the results of
Eqs. (II-16) and (II-17) are new. Comments on this point are solicited.

Note that although emphasis has been placed on the explicit solution to
Eq. (II-10) given by Eqs. (II-16) and (II-17), the eigenvalue solution, Eq. (II-11),
is just as valid a solution. Since the two solutions are not identical and both
are valid, it is observed that Eq. (II-10) admits of at least two solutions. The
relationships between these are discussed in Addendum C.
ADDENDUM A


EQUIVALENCE BETWEEN SOLUTIONS OF THE KARHUNEN-LOEVE
EQUATION AND BILINEAR FORMULA


As an initial step, expand $\psi_m(t)$ in terms of $\phi_n(t)$, and $\phi_n(t)$ in terms of
$\psi_m(t)$:

\[ \psi_m(t) = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} b_{mn}\,\phi_n(t) \tag{II-22} \]

\[ \phi_n(t) = \underset{M \to \infty}{\mathrm{l.i.m.}} \sum_{m=1}^{M} d_{nm}\,\psi_m(t), \qquad a < t < b \tag{II-23} \]

Now

\[ b_{mn} = \int_a^b \psi_m(t)\,\phi_n(t)\,dt \tag{II-24a} \]

\[ d_{nm} = \int_a^b \phi_n(t)\,\psi_m(t)\,dt \tag{II-24b} \]

therefore

\[ b_{mn} = d_{nm} \tag{II-24c} \]


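Now apply Eqs. (II-9), (II-22), and (II-3) to (II-7) to obtain a solution to
the Karhunen-Loeve equation:

\[ \underset{N \to \infty}{\mathrm{l.i.m.}}\; \lambda_m^2 \sum_{n=1}^{N} b_{mn}\,\phi_n(t) = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} \sum_{j=1}^{N} E(C_n C_j)\,b_{mj}\,\phi_n(t) \tag{II-25} \]

Multiply both sides of Eq. (II-25) by Eq. (II-22) and integrate again over t;
apply Eq. (II-24) to obtain

\[ \lambda_m^2 = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} \sum_{j=1}^{N} E(C_n C_j)\,b_{mj}\,b_{mn} \tag{II-26} \]

where we have used Eq. (II-27), which is derived from Eqs. (II-6) and (II-22):

\[ \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} b_{mn}^2 = 1 \tag{II-27} \]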


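Consider now the result obtained by equating (II-8) and (II-9); applying
Eq. (II-23) to (II-9) we obtain

\[ \underset{M \to \infty}{\mathrm{l.i.m.}} \sum_{m=1}^{M} \lambda_m^2\,\psi_m(t)\,\psi_m(s) = \underset{N, M \to \infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} \sum_{j=1}^{N} \sum_{m=1}^{M} \sum_{k=1}^{M} E(C_n C_j)\,d_{nm}\,d_{jk}\,\psi_m(t)\,\psi_k(s) \tag{II-28} \]

Multiplying each side by $\psi_m(t)\,\psi_m(s)$, integrating over t and s, and
applying Eqs. (II-6) and (II-24c) provides

\[ \lambda_m^2 = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{j=1}^{N} \sum_{n=1}^{N} E(C_n C_j)\,b_{mn}\,b_{mj} \tag{II-29} \]

The equivalence of Eqs. (II-26) and (II-29) proves the equivalence of
Eqs. (II-7) and (II-10).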
ADDENDUM B


DERIVATION OF FORMS FOR $a_{mn}$


FORM I FOR $a_{mn}$

Bodewig(6) defines a matrix $C^{(i)}$ according to

\[ C^{(i)} = C^{(i-1)} - x_i\,y_i' = C - \sum_{j=1}^{i} x_j\,y_j', \qquad C^{(N)} = 0 = C^{(N-1)} - x_N\,y_N' = C - \sum_{j=1}^{N} x_j\,y_j' \tag{II-30} \]

so that

\[ C = \sum_{j=1}^{N} x_j\,y_j' \tag{II-31} \]


It can be seen that $x_i$ and $y_i'$ defined as in Eq. (II-32) satisfy these relations;
i.e., $x_i$ has components

\[ x_{ji} = \frac{c_{ji}^{(i-1)}}{\sqrt{c_{ii}^{(i-1)}}}, \qquad j \ge i \tag{II-32} \]

where $c_{ji}^{(i-1)}$ are the ji-th components of $C^{(i-1)}$. Likewise, $y_i'$ has
components

\[ y_{ij} = \frac{c_{ij}^{(i-1)}}{\sqrt{c_{ii}^{(i-1)}}}, \qquad j \ge i \]

therefore for symmetrical C, $x_i = y_i$.

It can then be seen that $\sum_{j=1}^{N} x_j\,e_j'$ forms a lower triangle matrix plus
diagonal and $\sum_{j=1}^{N} e_j\,y_j'$ an upper matrix plus diagonal. Hence

\[ \sum_{j=1}^{N} x_j\,y_j' = \Bigl(\sum_{j=1}^{N} x_j\,e_j'\Bigr)\Bigl(\sum_{j=1}^{N} e_j\,y_j'\Bigr) \tag{II-33} \]

and therefore,

\[ \sum_{j=1}^{N} e_j\,y_j' = \sqrt{D}\,(I + U) = A' \tag{II-34} \]


Utilizing the formats on pp. 86 ff. of Bodewig (and our notation), it can
be seen that, with $C^{(0)} = C$, the $c_{ij}^{(p)}$'s become

\[ c_{ij}^{(p)} = \frac{|C_{11}\,C_{22} \cdots C_{pp}\,C_{ij}|}{|C_{11}\,C_{22} \cdots C_{pp}|} \]

and the elements of A', $a_{ij}$, are

\[ a_{ij} = \frac{c_{ij}^{(i-1)}}{\sqrt{c_{ii}^{(i-1)}}} = \frac{|C_{11}\,C_{22} \cdots C_{i-1\,i-1}\,C_{ij}|}{\sqrt{|C_{11}\,C_{22} \cdots C_{ii}|\;|C_{11}\,C_{22} \cdots C_{i-1\,i-1}|}} \tag{II-35} \]

Alternatively, $x_i$ and $y_i$ could have been defined as $s_i e_i'$ and $e_i s_i'$ according to
the notation introduced further on.

FORM II FOR $a_{mn}$

According to Bodewig(7)

\[ D + DU = C^{(N-1)} \tag{II-36} \]

\[ C^{(N-1)} = Q_{N-1}^{(2)} \cdots Q_2^{(2)}\,Q_1^{(2)}\,C \tag{II-37} \]

where D and U were defined previously (in Eq. (II-13) and what follows it) and
$Q_i^{(2)}$ is defined by

\[ Q_i^{(2)} = I + s_i\,e_i' \tag{II-38} \]
$s_i$ is a column vector all of whose elements above and including the i-th
one vanish, and $e_i'$ is a unit row vector with only the i-th element occupied, i.e.,

\[ e_1' = (1, 0, 0, 0, \ldots, 0), \qquad e_2' = (0, 1, 0, 0, \ldots, 0) \]

Therefore $s_i\,e_i'$ is an N x N matrix with zero elements everywhere but the (i+1)-th
through N-th elements of the i-th column.

The $s_i$ are explicitly column vectors whose elements are

\[ s_{pi} = -\frac{c_{pi}^{(i-1)}}{c_{ii}^{(i-1)}}, \qquad p = i + 1, \ldots, N \tag{II-39} \]

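where the $c_{pi}^{(i-1)}$ are the pi-th elements of

\[ C^{(i-1)} = Q_{i-1}^{(2)} \cdots Q_1^{(2)}\,C, \qquad \text{with } c_{pi}^{(0)} = C_{pi}. \]

Specifically, this method gives us an iterative procedure for finding D + DU.
Since the elements of D are then $c_{ii}^{(N-1)}$, according to Eq. (II-14) in the
body of this paper, the $a_{mn}$'s are given by

\[ a_{mn} = \frac{c_{mn}^{(N-1)}}{\sqrt{c_{mm}^{(N-1)}}} \tag{II-40} \]

and our iterative procedure can provide $a_{mn}$ directly. Also, in our particular
problem, from (II-40),

\[ \det D = \prod_{i=1}^{N} c_{ii}^{(N-1)} = \prod_{i=1}^{N} a_{ii}^2 \tag{II-41} \]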
ADDENDUM C


RELATIONSHIP BETWEEN TWO METHODS OF DIAGONALIZATION

A symmetric matrix C may be represented as the product of two matrices where
one is the transpose of the other in an infinite number of ways, all related by
orthogonal matrices. Thus, if A is determined by the method of Section II.2, then
the eigenvalues of C, the $\mu_i^2$'s, may be determined from the eigenvalue Eq. (II-42)
as well as (II-45).

\[ PA'\,x = \mu\,x \tag{II-42} \]

where P is an orthogonal matrix chosen to make Eq. (II-42) hold. The $\mu$'s
themselves are determined by the secular determinant

\[ |PA' - \mu I| = 0 \tag{II-43} \]

and

\[ AP'\,PA' = AA' = C \tag{II-44} \]

Recalling that the relations for C are

\[ Cx = \mu^2 x \tag{II-45} \]

\[ |C - \mu^2 I| = 0 \tag{II-46} \]

where x and $\mu$ are the same as in Eqs. (II-42) and (II-43).*

The eigenvalues of the triangular plus diagonal matrix A' are $\eta$, given by

\[ |A' - \eta I| = 0 \tag{II-47} \]

* The necessary and sufficient condition for this is that PA' and AP' commute.(8)
(It follows then that these also commute with C.)
or

\[ \prod_{i=1}^{N} (a_{ii} - \eta) = 0 \]

In fact, then, $\eta_i = a_{ii}$ and the eigenvalue equation for A' becomes

\[ A'y = \eta\,y \tag{II-48} \]

or

\[ y'A = y'\eta \]

Now, we cannot write

\[ Cy = \eta^2 y \]

since A and A', as determined in Section II.2, do not commute, i.e.,
$A'A \neq AA'$.(8)

However, we can obtain, by premultiplying both sides of Eq. (II-48) by P,

\[ \mu\,x = P\,\eta\,y \tag{II-49} \]

and multiplying each side by its transpose,

\[ \mu^2\,x'x = \eta^2\,y'y \tag{II-50} \]

Since the y's are not necessarily orthogonal we cannot write

\[ X M^2 X' = Y N^2 Y' \tag{II-51} \]

where X and Y are matrices whose columns are the eigenvectors x and y, respectively,
and $M^2$ and $N^2$ are the diagonal matrices whose elements are the $\mu_i^2$ and $\eta_i^2$ terms. In
fact, as we know

\[ X M^2 X' = C = AA' = (Y^{-1})'\,N\,Y'Y\,N\,Y^{-1} = (Y')^{-1}\,N\,Y'Y\,N\,Y^{-1} \tag{II-52} \]

where, although $Y' \neq Y^{-1}$, $(Y^{-1})' = (Y')^{-1}$.

Note that since A and A' do not commute

\[ Y N Y^{-1}\,(Y^{-1})'\,N\,Y' \neq C \tag{II-53} \]

Consequently, the eigenfunctions of A or A', y, are not useful for
obtaining a diagonal form.

However, it is useful to point out the following relationships:(9)

\[ \det C = \det AA' = (\det A)^2 = \det D = \prod_{i=1}^{N} \mu_i^2 = \prod_{i=1}^{N} a_{ii}^2 = \prod_{i=1}^{N} \eta_i^2 \le \prod_{i=1}^{N} \lambda_i^2 \tag{II-54} \]

where $\lambda_i$ are the constants determined in Eq. (II-17), and

\[ \mathrm{Sp}(C) = \mathrm{Sp}(AA') = \mathrm{Sp}(A'A) = N^2(A') = \sum_{m=1}^{N} \sum_{n=m}^{N} a_{mn}^2 = \sum_{n=1}^{N} C_{nn} = \sum_{i=1}^{N} \mu_i^2 = \sum_{i=1}^{N} \eta_i^2 + \sum_{m=1}^{N} \sum_{n=m+1}^{N} a_{mn}^2, \qquad \sum_{m=1}^{N} \lambda_m^2 = \sum_{i=1}^{N} \mu_i^2 \ge \sum_{i=1}^{N} \eta_i^2 \tag{II-55} \]
Now if we consider as a measure of the goodness of the finite series fit the
time-average of the mean square difference between the complete series and the
truncated series, $\langle \epsilon^2 \rangle$, then for the truncated form of the series whose terms are
given by Eqs. (II-16) and (II-17) it is

\[ \langle \epsilon^2 \rangle = \sum_{m=M+1}^{\infty} \lambda_m^2 = \langle K(t,t) \rangle - \sum_{m=1}^{M} \lambda_m^2 = \langle K(t,t) \rangle - \sum_{n=1}^{N} C_{nn} + \sum_{m=M+1}^{N} \lambda_m^2 \tag{II-56} \]

For the eigenfunction series we have, for an M term series,

\[ \langle \epsilon^2 \rangle = \langle K(t,t) \rangle - \sum_{n=1}^{N} C_{nn} + \sum_{i=M+1}^{N} \mu_i^2 \tag{II-57} \]

For the original series (II-1), we have

\[ \langle \epsilon^2 \rangle = \langle K(t,t) \rangle - \sum_{n=1}^{N} C_{nn} + \sum_{j=M+1}^{N} C_{jj} \tag{II-58} \]

(It is important to remember that the $\lambda$'s and $\mu$'s are computed from the N x N
matrix of C's. We then truncate to M < N terms. If they were computed from the
corresponding M x M matrix of C's, they would all have the same error as noted
below for M = N.)

Consequently, the relative goodness of each series in truncation is measured
by the smallness of the respective terms

\[ \sum_{m=M+1}^{N} \lambda_m^2, \qquad \sum_{i=M+1}^{N} \mu_i^2, \qquad \sum_{j=M+1}^{N} C_{jj} \tag{II-59} \]
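Now, since $a_{mn}$ is an element of A' and C = AA', it can be seen that

\[ C_{ij} = \sum_{m=1}^{\min(i,j)} a_{mj}\,a_{mi}, \qquad \text{and} \qquad C_{nn} = \sum_{m=1}^{n} a_{mn}^2 \]

where the upper limit is the lesser of j or i. Then,

\[ \sum_{n=M+1}^{N} C_{nn} = \sum_{n=M+1}^{N} \sum_{m=1}^{n} a_{mn}^2 = \sum_{n=M+1}^{N} \sum_{m=1}^{M} a_{mn}^2 + \sum_{n=M+1}^{N} \sum_{m=M+1}^{n} a_{mn}^2 = \sum_{n=M+1}^{N} \sum_{m=1}^{M} a_{mn}^2 + \sum_{m=M+1}^{N} \lambda_m^2 \ge \sum_{m=M+1}^{N} \lambda_m^2 \tag{II-60} \]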
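
The comparison in Eqs. (II-59) and (II-60) can be checked numerically. The sketch below is illustrative only: the covariance matrix is assumed data, the names `upper_factor` and `tail_terms` are hypothetical, and the eigenvalue tail of Eq. (II-57) is omitted because it would need a general eigenvalue routine.

```python
import math

def upper_factor(C):
    """Return A' (upper triangular) with C = A A', for symmetric positive-definite C."""
    size = len(C)
    At = [[0.0] * size for _ in range(size)]
    for m in range(size):
        At[m][m] = math.sqrt(C[m][m] - sum(At[k][m] ** 2 for k in range(m)))
        for j in range(m + 1, size):
            cross = sum(At[k][m] * At[k][j] for k in range(m))
            At[m][j] = (C[m][j] - cross) / At[m][m]
    return At

def tail_terms(C, M):
    """Return (sum of lambda_m^2 for m > M, sum of C_jj for j > M), as in Eq. (II-59)."""
    At = upper_factor(C)
    lam_sq = [sum(a * a for a in row) for row in At]
    new_series_tail = sum(lam_sq[M:])
    original_series_tail = sum(C[j][j] for j in range(M, len(C)))
    return new_series_tail, original_series_tail

# Hypothetical covariance matrix (assumed values).
C = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
for M in range(len(C) + 1):
    new_tail, old_tail = tail_terms(C, M)
    assert new_tail <= old_tail + 1e-12   # the inequality of Eq. (II-60)
```

With M = 0 both tails equal the trace of C, and with M = N both vanish, so the two series differ only for intermediate truncations.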


A possible decrease in time-average, mean-truncation error is gained over the
$\phi$'s by the $\psi$'s of Eq. (II-17). However, if the value $\epsilon^2$ (not time-averaged, but
the point-by-point mean error) is used as a criterion, an even greater decrease is
possible due to the appearance of cross products in the $\phi$ expression. To wit:


M  M 


H  N 


^  -  I  I  “nj  >’/*>  -  =  Z  Z  "nj 


n»l  J»1 


n-1 


(11-61) 


K(t,t)  -  Z  *  Z  4*) 


(11-62) 


i-*h-l 


In order to compare the $\mu$ expression with more than the following mode
approximation, it appears that more has to be known about the eigenfunctions, i.e.,
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(11-63) 


N  N  N  N 

I  1 1  z -f’ 


i=M+l  i^tfl  p«l  q=l 

Since  C  >  C  ,  p  q,  then 
PP  -  pq’ 

N  N  /N 

I  I  (I  Vv-i” 

i»Mfl  i-Hfl  \p»l 


(11-64) 
APPENDIX III


PROGRAM IDENTIFICATION - SE002


I. PURPOSE

To extract characteristic pitch periods, analyze and reconstruct them, and
"plot" the original and reconstructed points.

II. METHOD

After the orthonormal functions have been read, the program begins reading
the input tape. From each record a pitch period is extracted, the bounds of
which are specified on parameter cards. (For the same reasons as in the previous
program, SE001, the parameters were assembled with the program.) The pitch points
are individually converted to floating point, and this new form of the pitch
period is analyzed using the orthonormal functions such that thirty-two coefficients
result. Besides being written out with two identification words as a separate
output, these coefficients are used to reconstruct the speech.
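
The analysis and reconstruction step can be sketched as a pair of discrete inner products. This is a minimal illustration, assuming the orthonormal functions are available as equal-length sample lists and that analysis is a plain sum of products; the two 4-point functions, the sample values, and the names `analyze` and `reconstruct` are hypothetical and stand in for the thirty-two functions read from tape.

```python
def analyze(samples, functions):
    """One coefficient per function: the sum of sample * function value."""
    return [sum(x * f for x, f in zip(samples, fn)) for fn in functions]

def reconstruct(coefficients, functions, length):
    """Rebuild each point as the coefficient-weighted sum of the functions."""
    return [sum(c * fn[t] for c, fn in zip(coefficients, functions))
            for t in range(length)]

# Two orthonormal 4-point functions (assumed stand-ins for the tape contents).
functions = [[0.5, 0.5, 0.5, 0.5],
             [0.5, -0.5, 0.5, -0.5]]
pitch_period = [3.0, 1.0, 3.0, 1.0]   # already converted to floating point

coefficients = analyze(pitch_period, functions)
rebuilt = reconstruct(coefficients, functions, len(pitch_period))
```

Because this sample lies in the span of the two functions, the reconstruction is exact; in general the difference between `pitch_period` and `rebuilt` is the truncation error of the series.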

As each point in the pitch period is reconstructed, the original and the
reconstructed points are "plotted" on a scale of ten to one in a BCD output area.
The original points are plotted as periods, and the reconstructed points as
asterisks. If a reconstructed point exceeds 1023 or is negative, an "X" is
plotted in the highest or lowest plot position, respectively. A "+" indicates
that the scaled original and reconstructed points are plotted in the same
position.
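
The plotting rule described above can be sketched as follows. This is an illustration, not the SE002 code: the 103-column width is inferred from the ten-to-one scale over the range 0 to 1023, integer truncation is assumed for the scaling, and the function name `plot_line` is hypothetical.

```python
def plot_line(original, reconstructed, width=103):
    """Build one plot row: '.' original, '*' reconstructed, '+' both, 'X' off scale."""
    line = [" "] * width
    original_pos = original // 10          # ten-to-one scaling of 0..1023
    line[original_pos] = "."
    if reconstructed > 1023:
        line[width - 1] = "X"              # off scale high: highest position
    elif reconstructed < 0:
        line[0] = "X"                      # off scale low: lowest position
    else:
        reconstructed_pos = int(reconstructed) // 10
        line[reconstructed_pos] = "+" if reconstructed_pos == original_pos else "*"
    return "".join(line)

row = plot_line(500, 620.7)
```

In this sketch an off-scale "X" overwrites the original's mark if both fall in the same column; the source does not say how the program resolves that case.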

When twenty pitch periods in each of five files have been handled, the
program prints a message on-line and stops.

III.  DATA  REQUIREMENTS 
A.  Parameters 

1. Four parameters are required for each pitch period.

2. The first parameter appears as a "BCI" card containing the time in
BCD, three characters for seconds and three for milliseconds.
These must match the time on the portion of the input record in
which the pitch period begins.

3. The second parameter indicates the actual starting point of the pitch
period. It takes the form of a TXI instruction with a fixed address,
"XEC01", fixed tag, "4", and a negative variable decrement which is
the actual number of the beginning point.

4. The third parameter is similar to the first except that it contains
the time of the portion of the input record in which the pitch
period ends.

5. The fourth parameter is similar to the second but it defines the end
point of the pitch period. It takes the form of a TXI instruction
with a fixed address, "XEC02", fixed tag, "4", and a positive
variable decrement which is the actual number of the end point of the
pitch period.

6. The program is written to handle five files of twenty pitch periods
each, hence there are five sets of eighty parameters each or a
total of four hundred.

7. As in SE001, the parameters were assembled with the symbolic deck
in the interest of speed.

8. Changes to any one parameter, or set of parameters, are discussed under
the modifications section of this report.

B.  INPUT 

1.  Orthonormal  Functions  Tape 

a.  This  input  is  on  a  low  density  magnetic  tape,  created  in  binary 
mode. 

b.  There  is  one  file  of  data. 

c. The file consists of thirty-two 301-word records.

d. Each function is represented in floating point binary.

2. Pitch Point Tape

a. The input data is on a high density magnetic tape, created in
binary mode by the program identified as SE001.

b. There are five files of data on the tape.

c. Each file contains twenty 305-word records.

d. Each 305-word record can be subdivided into five 61-word
subrecords.

e. Each subrecord has as its first word a BCD representation of time,
three characters for seconds and three for milliseconds.

f. Each of the remaining sixty words contains a pitch point in binary,
which ranges from zero (0) to one thousand and twenty-three (1,023).

g. The pitch points are right justified.
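
The record layout above can be sketched as a simple split of one 305-word record into its five subrecords. This is an illustration under stated assumptions: each word is modeled as a Python value, the BCD time word is represented as a six-character string, and the function name `split_record` is hypothetical.

```python
def split_record(words):
    """Split a 305-word record into five (seconds, milliseconds, points) subrecords."""
    if len(words) != 305:
        raise ValueError("a pitch point record must contain 305 words")
    subrecords = []
    for start in range(0, 305, 61):
        time_word = words[start]                 # e.g. "036647"
        seconds = int(time_word[:3])
        milliseconds = int(time_word[3:6])
        points = words[start + 1:start + 61]     # sixty pitch points, 0..1023
        subrecords.append((seconds, milliseconds, points))
    return subrecords

# A made-up record: five subrecords, each with a time word and sixty points.
record = []
for k in range(5):
    record.append("036%03d" % (647 + k))
    record.extend([512] * 60)
subrecords = split_record(record)
```

Each tuple carries the time of the subrecord, which is what the first and third parameters of a pitch period are matched against.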

C.  OUTPUT 


1.  Coefficients 

a.  The  coefficients  are  written  on  a  high  density  magnetic  tape. 

b. There are five files each containing twenty 34-word records.

c. The first two words of the thirty-four are identification words.
Word one contains the file number in the decrement portion, and
the number of points in the analyzed pitch period in the address
portion. Word two contains the BCD representation of the time
of the record in which the pitch period began.

d. Words three through thirty-four contain thirty-two coefficients
in normalized floating point.

2.  "PLOT" 

a. The "plot" is on a high density magnetic tape, written in BCD
mode.

b. It contains one file of blocked records, five 22-word records
per block.

c. There is a header record for each pitch period. It contains the
file and pitch period numbers, as well as the time of the input
subrecord in which the pitch period begins.

d. The header record is followed by 32 coefficient records. The
odd numbered coefficients have in the same record the square
root of the sum of the squares of the odd numbered coefficient
and the next even one.

e. The coefficients are followed by as many records as there are
points in the pitch period. The format for these records is
as follows:


Position Number    Contents

1 - 3              pitch point number

7 - 10             original pitch point

13 - 16            reconstructed pitch point

25 - 127           the "plot" of the original and reconstructed
                   pitch points: "." for the original, "*" for
                   reconstructed, "+" if both are in the same
                   position, "X" if off scale.
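
The quantity printed alongside each odd-numbered coefficient, as described in item d above, can be sketched as follows. The coefficient values and the function name `pair_magnitudes` are illustrative assumptions.

```python
import math

def pair_magnitudes(coefficients):
    """Square root of the sum of squares of each odd-numbered coefficient and the next even one."""
    return [math.hypot(coefficients[i], coefficients[i + 1])
            for i in range(0, len(coefficients) - 1, 2)]

# Thirty-two made-up coefficients; coefficient 1 is index 0, coefficient 2 is index 1, etc.
coefficients = [3.0, 4.0] + [0.0] * 30
magnitudes = pair_magnitudes(coefficients)
```

Thirty-two coefficients give sixteen such values, one per odd/even pair.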
3. Unreadable Records

Any record which cannot be read is reread 10 times. If the reading
still fails, the record is written, as is, on the unreadable records
tape. This operation is a fixed function of the I/O package.
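
The reread policy can be sketched as a bounded retry loop. This is only an illustration of the stated rule, not the I/O package itself: the callable `read_once`, the helper `make_flaky_reader`, and the interpretation of "reread 10 times" as one initial read plus up to ten rereads are all assumptions.

```python
def read_with_rereads(read_once, max_rereads=10):
    """Try the initial read plus up to max_rereads rereads.

    read_once() must return (ok, record). Returns (record, readable); when
    readable is False the caller would copy the record, as is, to the
    unreadable records tape.
    """
    record = None
    for _ in range(1 + max_rereads):
        ok, record = read_once()
        if ok:
            return record, True
    return record, False

def make_flaky_reader(failures):
    """Hypothetical tape reader that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}
    def read_once():
        state["calls"] += 1
        return state["calls"] > failures, "RECORD"
    return read_once, state

reader, state = make_flaky_reader(3)
record, readable = read_with_rereads(reader)
```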


IV.  MACROS 

All macros used are part of the Input-Output package written by A/2C Robert
Barger at the Air Force Intelligence Center, Washington, D. C. These include:

QFILE - a macro to define the files and set certain options in the
program.

QOPEN - a macro to initialize the I/O system.

QWRITE - a macro to write logical records on tape.

QFINSH - a macro to finish writing a block initiated by QWRITE.

QCLOSE - a macro to wind up all I/O functions.

V.  SUBROUTINES 

A. IORDWR - a subroutine in the I/O package to read or write tape,
depending on the calling sequence.

B. BINDEC - a subroutine developed at AFIC to convert binary to BCD.

C. SQRT - a subroutine currently available as part of FMS to take square
roots.

VI. MACHINE CONFIGURATION

A. 32K 7090 with two channels, A and B

B. 5 tape drives on Channel A

C.  4  tape  drives  on  Channel  B 

D.  1  card  reader 

E.  1  printer 

VII.  OPERATING  INSTRUCTIONS 

A. Load IB Monitor (SOS) system tape on A1

B. Load orthonormal functions tape on B4

C. Load pitch point data tape on A6

D. Load blanks on:

A2 - required by the system

A4 - binary output - coefficients tape

A7 - unreadable records tape

B1 - required by the system

B2 - required by the system

B6 - BCD output - "plot" tape.

E.  Put  squoze  deck  in  card  reader  hopper  and  ready. 

F.  Put  sense  switch  1  on,  all  others  off. 

G.  Clear  memory. 

H. Load tape.

I. At END-OF-JOB

1. return system tape (A1) and orthonormal functions tape to programmer.

2. return binary input (A6) to storage (optional)

3. label and store binary output (A4)

4. if no input records were unreadable, release A7 immediately. If
there were unreadable records, releasing is at the option of the
programmer.

5. list BCD "plot" tape (B6) on 1401 with special program which prints
blocked records.


VIII. ERROR MESSAGE

A. Message on line: "ILLEGAL EOF ON ORTHONORMAL FUNCTIONS TAPE."

   Cause: Stated in message.

   Corrective action:
   1.) This stop should never happen.
   2.) Check mod package to make sure nothing is interfering with
       this macro.
   3.) If macro is O.K., regenerate the tape and rerun.

B. Message on line: "ILLEGAL EOF ON INPUT TAPE."

   Cause: Stated in message.

   Corrective action:
   1.) The input tape is required to have five files, each containing
       20 305-word records.
   2.) Check the input tape and regenerate if necessary.

C. Message on line: "INPUT TIME HIGHER THAN SELECTION TIME."

   Cause: The time on the input record is higher than the selection
   time specified for the beginning of the selection.

   Corrective action:
   1.) Check the mod package to make sure there are no overlapping
       alter cards.
   2.) Check for keypunch errors in the parameters. If so, correct
       and rerun.
   3.) Check listing of previous program to make sure the beginning
       and end points are actually times of the records on the input
       data tape. Correct and rerun.

D. Message on line: "NO INPUT TIME MATCHES CURRENT SELECTION TIME."

   Cause: There is no match in the input record for the selection
   time specified.

   Corrective action: Same as for C.

E. Message on line: "IN TIME HIGHER THAN END SELECTION TIME."

   Cause: The time on the input record is higher than the time
   specified for the end of the selection.

   Corrective action: Same as for C.

F. Message on line: "TROUBLE ADVANCING TO NEXT FILE. ILLEGAL EOF."

   Cause: An extra end of file is detected somewhere between files.

   Corrective action:
   1.) Check for multiple tape marks between files on the pitch
       period tape.
   2.) Regenerate if necessary.

N.B. If necessary, a system dump may be produced by manually transferring to
112₁₀ or 160₈.

IX. MODIFICATIONS

Modifications may be made to any single parameter in one of the following
ways:

A. Changing the "beginning" time:

Find in the listing the "beginning time" to be changed [symbolic address
"BANDE" + 80 (File number - 1) + 4 (Selection number - 1)]. This
will appear as BCI 1,xxxxxx, with x's equal to the time in seconds and
milliseconds. Find the alter number associated with it. This is in
the column to the left of the symbolic address field in the listing.
Punch the alter number twice in succession, separated by a comma. Punch
a new BCI card replacing the old time with the new. Put the change in
the mod package and run.

e.g. To change the beginning time of the third selection of file 2:

BANDE + 80 (2 - 1) + 4 (3 - 1) = BANDE + 88, or alter number 1835.

Punch

Column    8       16

          ALTER   1835,1835
          BCI     1,YYYYYY

where the Y's represent the new time.

B. Changing the number of the first point:

Find in the listing the point to be changed ["BANDE" + 80 (File number - 1)
+ 4 (Selection number - 1) + 1]. It will appear as "TXI XEC01,4,-X",
where "X" is the point number. Find the associated alter number.
Punch the number in an alter card. Punch a new TXI card replacing the old
point number with the new. Put the two cards in the mod package and run.

e.g. To change the beginning point number of the third selection of
file 2:

BANDE + 80 (2 - 1) + 4 (3 - 1) + 1 = BANDE + 89, or alter number 1836.

Punch

Column    8       16

          ALTER   1836,1836
          TXI     XEC01,4,-Y

where Y represents the new point number.

C. Changing the end time is the same as changing the beginning time, but
add 2 to the alter number.

e.g. To change the end time of the third selection of file 2:

Punch

Column    8       16

          ALTER   1837,1837
          BCI     1,YYYYYY

where the Y's represent the new time in seconds and milliseconds.


D. Changing the number of the end point is analogous to changing the number of
the first point, except the alter number is 2 more than the one for the
beginning point, the address is XEC02, and the decrement is positive.

e.g. To change the end point number of the third selection of file 2:

Punch

Column    8       16

          ALTER   1838,1838
          TXI     XEC02,4,Y

where Y represents the new point number.

E.  If  the  set  of  four  parameters  for  a  pitch  period  is  to  be  changed,  the 
individual  parameters  would  be  punched  according  to  the  format  above. 

The  four  alter  cards  would  be  replaced  by  one,  punched  as  follows  for 
the  preceding  example. 

Column  8  16 

ALTER  1835,1838 
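
The address arithmetic used in items A through D can be checked with a small sketch. It is written in Python for illustration only. The function and constant names are hypothetical, the figure of 20 selections per file is inferred from the multiplier 80 (20 selections of 4 parameters each), and the alter number of BANDE itself is inferred from the single worked example above (BANDE + 88 is alter number 1835); it would differ for any other assembly listing.

```python
PARAMETERS_PER_SELECTION = 4
SELECTIONS_PER_FILE = 20  # assumption: 80 words per file / 4 parameters
WORDS_PER_FILE = PARAMETERS_PER_SELECTION * SELECTIONS_PER_FILE  # 80

# Position of each parameter within a selection's group of four words.
PARAMETER_INDEX = {
    "begin_time": 0,
    "begin_point": 1,
    "end_time": 2,
    "end_point": 3,
}

# Assumption taken from the worked example only: BANDE + 88 is alter 1835.
ASSUMED_ALTER_NUMBER_OF_BANDE = 1835 - 88


def bande_offset(file_number, selection_number, parameter):
    """Offset from BANDE: 80(file - 1) + 4(selection - 1) + parameter index."""
    if not 1 <= file_number <= 5:
        raise ValueError("file number must be 1 through 5")
    if not 1 <= selection_number <= SELECTIONS_PER_FILE:
        raise ValueError("selection number out of range")
    return (
        WORDS_PER_FILE * (file_number - 1)
        + PARAMETERS_PER_SELECTION * (selection_number - 1)
        + PARAMETER_INDEX[parameter]
    )


def alter_number(file_number, selection_number, parameter):
    """Alter number, assuming one card (one alter number) per word."""
    return ASSUMED_ALTER_NUMBER_OF_BANDE + bande_offset(
        file_number, selection_number, parameter
    )
```

For the third selection of file 2, the four parameters come out as alter numbers 1835 through 1838, matching the examples in items A through E.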
APPENDIX IV


PROGRAM IDENTIFICATION - SE001


I.  PURPOSE 

To select sets of records from five sound streams, each set containing one
or more representative pitch periods, and to plot the pitch points contained
therein.

II.  METHOD 

Since the program is tailor-made for a particular input, a brief description
of this input is in order. Five people were recorded, each one saying ten sounds
and then repeating them. The sounds were digitized and the resultant record placed
on magnetic tape. This tape is the program input. The parameters -- the numbers
of the records in which the sponsor was interested -- were assembled with the
original symbolic deck in order to avoid the necessity of reading a parameter tape.

Before a record is read, the record counter is checked to see if it is of
any interest. If it is not, the counter is incremented by one, the record
effectively skipped, and the next record considered. If it is a selected record, a
header consisting of file number, selection number, and time, in seconds and
milliseconds, is written on the BCD output tape. The time in BCD is also stored in
the first word of the binary output area.

Each point contained in the record is treated separately. First, the point
is unpacked from the original format of 3 points/word. If, when tested, it is
found to be greater than 1023, it is replaced by 1023, stored in the binary
output area, and a flag set to indicate this condition on the "plot." Otherwise, the
input number is stored in the binary output area and then examined to determine
its "plot" position. The "plot" is in reality a block of 103 positions of the
BCD output area in which a character, a period, is stored depending on the
magnitude of the given number after it has been scaled ten to one. After it has been
plotted, the number and the running pitch count are converted to BCD and both are
stored in the output BCD record and the record written. This process continues
until the 20 words, 60 points, of input are completed. Four more input records,
in sequence, are added to this group before the selection search is repeated.
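
The limiting and "plot" positioning just described can be sketched as follows. This is a Python illustration, not the original 7090 code. The function names are hypothetical, and "scaled ten to one" is assumed to mean truncating integer division by ten, which maps 0 through 1023 onto exactly 103 positions. The first plot position is taken as character position 25, per the BCD output format in section III.

```python
MAX_POINT = 1023          # largest value kept in the binary output
PLOT_FIRST_COLUMN = 25    # first "plot" position in the BCD record (1-based)
PLOT_WIDTH = 103          # positions 25 through 127


def clamp_point(value):
    """Return (stored_value, overflowed) for one unpacked pitch point."""
    if value > MAX_POINT:
        return MAX_POINT, True
    return value, False


def plot_column(value):
    """1-based record position of the plot character for a pitch point.

    Assumption: scaling "ten to one" is truncating division by 10.
    """
    stored, _ = clamp_point(value)
    return PLOT_FIRST_COLUMN + stored // 10
```

Under these assumptions a point of 0 lands in position 25 and a point of 1023 (or any overflowed point) lands in position 127, the last of the 103 plot positions.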

When 20 selections for the first file are processed, the input is spaced forward
to commence similar treatment of the remaining four files. When all the data is
completed, the output is finished and a message is printed on line.

III. DATA REQUIREMENTS

A. Parameters:

The parameters in this program are a little unusual in that they are actual
instructions assembled with the program. The input records were counted,
beginning with zero, and the counts associated with the first record of the sets of five
for each selection were supplied to the program as decrements of TXL instructions.
This approach was used to avoid the necessity of reading a parameter tape (the I/O
package being used excludes the possibility of reading parameters from cards) and
to decrease the execution time of the program. The section on modifications in
this write-up will illustrate how modifications can be made.


B. INPUT

1. The input data is on a low density, binary mode, magnetic tape.

2. There must be five files of data on the tape.

3. Each record is 21 words long. The first word is in three parts.
The first part, 12 bits, is an ID not used in this program.
The next two parts are time specifications: 12 bits for time in seconds and
12 for milliseconds. The remaining 20 words contain pitch points of 12
bits each, packed three per word.

4. Of the 12 bit pitch point, bit zero is unused. Bit 1 indicates an
overflow condition if it is one. The remaining 10 bits can contain a number
equal to or less than 1777 in octal or 1023 in decimal.
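
The record layout in items 3 and 4 can be illustrated with a decoding sketch. It is written in Python for illustration; the function names are hypothetical. Two assumptions are made that the text does not state outright: the three 12-bit parts of each 36-bit word are taken leftmost (high-order) first, and "bit zero" of a pitch point is taken to be its leftmost bit.

```python
WORDS_PER_RECORD = 21
FIELD_MASK = 0o7777       # one 12-bit field
OVERFLOW_BIT = 0o2000     # bit 1 of the field, counting bit 0 as leftmost
VALUE_MASK = 0o1777       # remaining 10 bits, at most 1023


def split_word(word):
    """Split a 36-bit word into three 12-bit fields, leftmost first."""
    return (word >> 24) & FIELD_MASK, (word >> 12) & FIELD_MASK, word & FIELD_MASK


def decode_point(field):
    """Return (value, overflow) for one 12-bit pitch point field."""
    return field & VALUE_MASK, bool(field & OVERFLOW_BIT)


def decode_record(words):
    """Return (seconds, milliseconds, points) for one 21-word input record.

    points is a list of 60 (value, overflow) pairs. The ID in the first
    part of the first word is ignored, as in the program described.
    """
    if len(words) != WORDS_PER_RECORD:
        raise ValueError("an input record is 21 words long")
    _unused_id, seconds, milliseconds = split_word(words[0])
    points = [decode_point(f) for w in words[1:] for f in split_word(w)]
    return seconds, milliseconds, points
```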


C. OUTPUT

1. Binary Output

a) The binary output is a magnetic tape written in high density.

b) It contains five files.

c) Each file contains 20 305-word records.

d) Each record is the expansion of five consecutive input records. The
first word is the time of the first record of the set in BCD, three
characters for seconds and three for milliseconds. The next 60 words
are the unpacked pitch points of the first record, with overflow
conditions removed. The format is repeated for each of the other four records
in the set.
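
The 305-word figure follows from the layout in item d: five input records, each expanded to one time word plus 60 point words, give 5 x 61 = 305 words. A minimal Python sketch of that layout is given below for illustration. The function name is hypothetical, the time word is represented simply as a six-character string rather than as 7090 BCD character codes, and "overflow conditions removed" is interpreted as limiting each point to 1023, as described under METHOD.

```python
RECORDS_PER_SET = 5
POINTS_PER_RECORD = 60
WORDS_PER_OUTPUT_RECORD = RECORDS_PER_SET * (1 + POINTS_PER_RECORD)  # 305


def build_output_record(input_records):
    """Lay out one binary output record from five decoded input records.

    Each input record is a (seconds, milliseconds, points) tuple, where
    points is a list of 60 integers. The result is a list of 305 "words":
    a time string followed by 60 points, repeated five times.
    """
    if len(input_records) != RECORDS_PER_SET:
        raise ValueError("a set consists of five input records")
    words = []
    for seconds, milliseconds, points in input_records:
        if len(points) != POINTS_PER_RECORD:
            raise ValueError("each input record holds 60 points")
        # Three characters for seconds, three for milliseconds.
        words.append(f"{seconds:03d}{milliseconds:03d}")
        words.extend(min(p, 1023) for p in points)
    return words
```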
2. BCD OUTPUT

a) The BCD output is a high density magnetic tape.

b) There is one file of blocked records, 5 22-word records/block.

c) A major header record is written once for each selection. It contains
the file number, one through five, the selection number, one through
twenty, and the number of the input record.

d) A minor header is written for each input record in the selection;
i.e. five per selection. It contains the time in seconds and
milliseconds of the individual record.

e) Every minor header is followed by 60 point records. Each of these
contains the number of the pitch point, the pitch point, and the "plot"
of the pitch point. The number of the pitch point occupies character
positions one and two of the record. The pitch point itself occupies
positions seven through ten. The "plot" character can appear anywhere
from position 25 through 127.

f) When an input file is completed, a record of asterisks is written on
the BCD output.


3. UNREADABLE RECORDS

a) A record is not deemed unreadable until it has been read and reread
ten times.

b) An unreadable record is written on A7 (fixed by the I/O package) in
the same form as it is read.


IV. MACROS

All the macros used are part of the Input-Output package written by A/2c
Robert Barger at the Air Force Intelligence Center, Washington, D.C.

These include:

QFILE - a macro to define the files and set certain options in the program.

QOPEN - a macro to initialize the I/O system.

QWRITE - a macro to write logical records on tape.

QFINSH - a macro to finish writing a block initiated by QWRITE, and

QCLOSE - a macro to wind up all I/O functions.


V. SUBROUTINES

A. IORDWR - a subroutine in the I/O package to read or write tape, depending
on the calling sequence.

B. BINDEC - a subroutine, developed at AFIC, to convert from binary to decimal.

VI. MACHINE CONFIGURATION

A. 32K 7090 with 2 channels, A and B.

B. 5 tape drives on channel A

C. 3 tape drives on channel B

D. 1 card reader

E. 1 printer

VII. OPERATING INSTRUCTIONS

A. Load IB Monitor (SOS) system tape on A1

B. Load input tape on A4

C. Load blanks on:

A2 - required by the system
A6 - binary output tape
A7 - unreadable records tape
B1 - required by the system
B2 - required by the system
B4 - BCD output

D. Put squoze deck in card reader hopper and ready.

E. Put sense switch 1 on, all others off.

F. Clear memory.

G. Load tape.

H. At END-OF-JOB:

1) Return A4 and A1 to programmer.

2) List B4 with special 1401 print program which prints blocked records.

3) Label and save A6.

4) Saving A7 is optional.

VIII. ERROR MESSAGE

A. "THERE IS AN ILLEGAL EOF" will be printed if an unexpected EOF occurs.
The final program halt will indicate at which point in the program it
occurred.

HTR 10226₈ - when a record of interest is being read.

HTR 10227₈ - when a record of no interest is being skipped.

HTR 10230₈ - when the tape is being spaced forward after a legitimate EOF.

B. A dump may be taken by manually transferring to 10S31₈, or if this fails,
by transferring to 160₈ (112₁₀).




IX. MODIFICATIONS


If for any reason the record number of the first record of a selected set must
be changed, this can be accomplished easily by the following procedure:

1) Find the instruction in the listing which contains the record number to
be changed.

2) Punch an alter card (ALTER in columns 8 - 12) with the alter number of
the instruction in columns 16 on. The alter number is found to the left
of the symbolic address field in the listing.

3) Repunch the card exactly as before, replacing the old number with the
new.

4) Insert the two cards in the mod package before running.

5) For example, to change the record number of the fifth selection of file 4:

Column    8       16

          ALTER   1511,1511
          TXL     SKIP 1,4,"X"

where "X" is the new number.